Abstract: Due to its attractive properties, generalized frequency
division multiplexing (GFDM) is recently being discussed as a
candidate waveform for the fifth generation of wireless
communication systems (5G). GFDM is introduced as a generalized
form of the widely used orthogonal frequency division multiplexing
(OFDM) modulation scheme and, since it uses only one cyclic prefix
(CP) for a group of symbols rather than a CP per symbol, it is more
bandwidth efficient than OFDM. In this paper, we propose novel
transceiver structures for GFDM by taking advantage of the
particular structure of the modulation matrix. Our proposed
transmitter is based on modulation matrix sparsification through
application of the fast Fourier transform (FFT) to reduce the
implementation complexity. A unified receiver structure for matched
filter (MF), zero forcing (ZF) and minimum mean square error (MMSE)
receivers is also derived. The proposed receiver techniques harness
the special block circulant property of the matrices involved in
the demodulation stage to reduce the computational cost of the
system implementation. We have derived the closed forms for the ZF
and MMSE receiver filters. Additionally, our algorithms do not
incur any performance loss as they maintain the optimal
performance. The computational costs of our proposed techniques are
analyzed in detail and are compared with the existing solutions
that are known to have the lowest complexity. It is shown that
through application of our transceiver structure a substantial
amount of computational complexity reduction can be achieved.

I. Introduction 

OFDM has been the technology of choice in wired and
wireless systems for years, ID-EI. The advent of the 
fifth generation of wireless communication systems (5G) and 
the associated focus on a wide range of applications from 
those involving bursty machine-to-machine (M2M) like traffic 
to media-rich high bandwidth applications has led to a
requirement for new signaling techniques with better time and
frequency containment than that of OFDM. Hence, a plethora 
of waveforms are coming under the microscope for analysis 
and investigation. 

The limitations of OFDM are well documented. OFDM 
suffers from large out-of-band emissions, which not only cause
interference but also reduce the potential for exploiting
non-contiguous spectrum chunks through techniques such as
carrier aggregation. For future high bandwidth
applications this can be a major drawback. OFDM also has 
high sensitivity to synchronization errors especially carrier 
frequency offset (CFO). As a case in point, in multiuser uplink 
scenarios where OFDMA is utilized, in order to avoid the 
large amount of interference caused by multiple CFOs as 
well as timing offsets, stringent synchronization is required 
which in turn imposes a great amount of overhead to the 
network. This overhead is not acceptable for lightweight M2M 
applications for example. The presence of multiple Doppler 
shifts and propagation delays in the received uplink signal at
the base station (BS) results in some residual synchronization
errors and hence multiuser interference (MUI), a. The MUI 
problem can be tackled with a range of different solutions that 
are proposed in la-ci. However, these lead to an increased 
receiver computational complexity. Thus, one of the main 
advantages of OFDM, i.e., its low complexity, is lost. The 
challenge therefore is to provide waveforms with more relaxed 
synchronization requirements and more localized signals in 
time and frequency to suit future 5G applications, without the 
penalty of a more complex transceiver. 

There are many suggestions on the table as candidate 
waveforms M-M- In general, all of these signaling methods 
can be considered as filter bank multicarrier (FBMC) systems. 
They can be broadly broken into two categories, those with 
linear pulse shaping im, ca and those with circular pulse 
shaping, ifSl- ITOI . The former signals with linear pulse shaping 
have attractive spectral properties, 113 . In addition, these 
systems are resilient to the timing as well as frequency errors. 
However, the ramp-up and ramp-down of their signal which 
are due to the transient interval of the prototype filter result 
in additional latency issues. In contrast, FBMC systems with 
circular pulse shaping remove the prototype filter transients 
thanks to their so called tail biting property, IS). The waveform 
of interest in this paper is known as generalized frequency 
division multiplexing (GFDM) and it can be categorized as an 
FBMC system with circular pulse shaping. The focus of the 
paper, more specifically, is on the design of low complexity 
transceivers for GFDM. 

GFDM has attractive properties and as a result has recently 
received a great deal of attention. One of the main attractions 
of GFDM is that it is a generalized form of OFDM which 
preserves most of the advantageous properties of OFDM 
while addressing its limitations. As Datta and Fettweis have 
pointed out in iflTl . GFDM can provide a very low out-of- 
band radiation which removes the limitations of OFDM for 
carrier aggregation. It is also more bandwidth efficient than 
OFDM since it uses only one cyclic prefix (CP) for a group 
of symbols in its block rather than a CP per symbol as is the 
case in OFDM. Through circular filtering, GFDM removes 
the prototype filter transient intervals and hence the latency. 
Additionally, its special block structure makes it an attractive 
choice for low latency applications like IoT and M2M,
G3. Filtering the subcarriers using a well-designed prototype 
filter limits the intercarrier interference (ICI) only to adjacent 
subcarriers which reduces the amount of leakage between 
subcarriers and increases the resiliency of the system to CFO 
as well as narrow band interference. In other words, GFDM 
has robustness to synchronization errors. As Michailow et al 
report in iflSl . GFDM is also a good match for multiple input 
multiple output (MIMO) systems. 

The advantages of GFDM come at the expense of an
increased bit error rate (BER) compared with OFDM. This 
degradation is due to the fact that GFDM is a non-orthogonal 
waveform. Consequently, non-orthogonality of the neighboring
subcarriers and time slots results in self-interference. To
tackle this self-interference, matched filter (MF), zero forcing 
(ZF) or minimum mean square error (MMSE) receivers can 
be derived m. Since, the MF receiver cannot completely 
remove the ICI, ZF receiver can be utilized. However, due to 
its noise enhancement problem, ZF receiver incurs some BER 
performance loss. Thus, the MMSE approach can be chosen to 
reduce the noise enhancement effect and maximize the signal- 
to-interference plus noise ratio (SINR). As MF, ZF and MMSE 
receivers involve large matrix inversion and multiplication 
operations, they demand a large computational complexity that 
makes them inefficient for practical implementations. As an 
alternative solution, Datta et al, El, take a time domain suc¬ 
cessive interference cancellation approach. This solution can 
completely remove the effect of the self-interference. However, 
that solution is a computationally exhaustive procedure. In a 
more recent work from the same research group, Gaspar et al, 
El, take advantage of the sparsity of the pulse shaping filter 
in frequency domain to perform the interference cancellation 
in the frequency domain and hence further reduce the computational
complexity of the receiver. Even though the solutions
that are based on the results of El and El successive 
interference cancellation can remove the self-interference, they 
can incur error propagation problems. Recently, Matthe et al, 
El, have proposed a fast algorithm to calculate the ZF and 
MMSE receiver filters. Their approach is based on the Gabor
transform structure of GFDM. Although matrix inversion is
circumvented, multiplication of the ZF and MMSE matrices
by the received signal is a bottleneck in this approach, as
matrix to vector multiplication is a computationally expensive
operation.

In this paper, we design a low complexity transceiver 
structure for GFDM and therefore improve on the existing 
approaches. The special structure of the modulation matrix is 
utilized to reduce the complexity of the transmitter. Compared 
with the existing GFDM transmitter [20], so far known to
have the lowest complexity, our proposed transmitter structure 
is more computationally efficient. Based on the lessons that 
we learned from ICI cancellation in uplink OFDMA systems 
with interleaved subcarrier allocation, Q, we are able to 
substantially reduce the complexity of the ZF and MMSE 
receivers compared with the low complexity receiver structure 
that is proposed in El- We propose a unified structure 
for the MF, ZF and MMSE receivers. This unified receiver 
structure is beneficial as only the filter coefficients need to 
be changed for implementation of different receivers. These
coefficients can be stored in memory and used as needed
in different scenarios. For instance, the ZF receiver can be used
instead of the MMSE one at high signal-to-noise ratios (SNRs).
As our techniques are direct and no approximation is involved, 
our proposed receivers do not incur any performance loss 
compared with the optimal MF, ZF and MMSE receivers. 
Another advantage of our receiver structure with respect to 
interference cancellation receivers is that it is not iterative
and hence the computations can run in parallel which can
in turn reduce the overall processing delay of the system. As 
our proposed transceiver structure is based on sparsification of 
the matrices that are involved, it also provides savings in the 
memory requirements of the system. 

The rest of the paper is organized as follows. Section II
presents the GFDM system model. Sections III and IV include
the design and implementation of our proposed GFDM transmitter
and receiver structures, respectively. The computational
complexity of our transceiver pair is analyzed in Section V.
Finally, the conclusions are drawn in Section VI.

Notations: Matrices, vectors and scalar quantities are denoted
by boldface uppercase, boldface lowercase and normal letters,
respectively. $[\mathbf{A}]_{mn}$ and $[\mathbf{a}]_n$ represent the
element in the $m$th row and $n$th column of $\mathbf{A}$ and the
$n$th element of $\mathbf{a}$, respectively, and $\mathbf{A}^{-1}$
signifies the inverse of $\mathbf{A}$. $\mathbf{I}_M$ and
$\mathbf{0}_M$ are the identity and zero matrices of size
$M \times M$, respectively. $\mathbf{D} = \mathrm{diag}(\mathbf{a})$
is a diagonal matrix whose diagonal elements are formed by the
elements of the vector $\mathbf{a}$ and
$\mathbf{C} = \mathrm{circ}(\mathbf{a})$ is a circulant matrix whose
first column is $\mathbf{a}$. The round-down operator
$\lfloor \cdot \rfloor$ rounds the value inside to the nearest
integer towards minus infinity. The superscripts $(\cdot)^T$,
$(\cdot)^H$ and $(\cdot)^*$ indicate transpose, conjugate transpose
and conjugate operations, respectively. Finally, $\delta(\cdot)$,
$\circledast$ and $\bmod\ N$ represent the Dirac delta function,
$M$-point circular convolution and modulo-$N$ operations,
respectively.
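
The circulant and circular-convolution notation above is used throughout the derivations, so a small numerical illustration may help. The sketch below is not part of the original paper; the helper names `circ`, `cconv` and `fold` are hypothetical, and the vectors are arbitrary test data. It checks that multiplying by circ(a) equals circular convolution with a, and that transposing circ(a) gives the circulant matrix of the circularly folded vector, a fact the receiver derivation relies on.

```python
import numpy as np

def circ(a):
    """Circulant matrix whose first column is a (hypothetical helper)."""
    a = np.asarray(a)
    n = len(a)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return a[idx]

def cconv(a, b):
    """Circular convolution of two equal-length vectors, computed via the DFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def fold(a):
    """Circularly folded version of a: [a_0, a_{M-1}, ..., a_1]."""
    a = np.asarray(a)
    return np.roll(a[::-1], 1)

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.5, -1.0, 0.0, 2.0])

lhs = circ(a) @ b          # matrix form
rhs = cconv(a, b)          # convolution form
print(np.allclose(lhs, rhs))
print(np.allclose(circ(a).T, circ(fold(a))))
```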

II. System Model for GFDM 

We consider a GFDM system with a total of $N$ subcarriers
and $M$ symbols in each block. In a GFDM block, $M$ symbols
overlap in time. Therefore, we call $M$ the overlapping factor
of the GFDM system. The $MN \times 1$ vector
$\mathbf{d} = [\mathbf{d}_0^T, \ldots, \mathbf{d}_{N-1}^T]^T$
contains the complex data symbols of the GFDM block, where the
$M \times 1$ data vector
$\mathbf{d}_i = [d_i(0), \ldots, d_i(M-1)]^T$ contains the data
symbols to be transmitted on the $i$th subcarrier. To put it
differently, $d_i(m)$ is the data symbol to be transmitted at the
$m$th time slot on the $i$th subcarrier. The data symbols are taken
from a zero mean independent and identically distributed (i.i.d.)
process with unit variance. In GFDM modulation, the data symbols to
be transmitted on the $i$th subcarrier are first up-sampled by
a factor of $N$ to form an impulse train

$$s_i(n) = \sum_{k=0}^{M-1} d_i(k)\,\delta(n - kN), \quad n = 0, \ldots, NM-1. \tag{1}$$

Then, $\mathbf{s}_i = [s_i(0), \ldots, s_i(MN-1)]^T$ is circularly
convolved with the prototype filter and up-converted to its
corresponding subcarrier frequency. After performing the same
procedure for
all the subcarriers, the resulting signals are summed up to form 
the GFDM signal x{n), ifl^ . 

$$x(n) = \sum_{i=0}^{N-1} \sum_{m=0}^{M-1} d_i(m)\, g_{\{(n - mN) \bmod MN\}}\, e^{j\frac{2\pi i n}{N}}, \tag{2}$$

where $g_\ell$ is the $\ell$th coefficient of the prototype filter.

Fig. 1. Baseband block diagram of a GFDM transceiver system.

Putting together all the transmitter output samples in an
$MN \times 1$ vector $\mathbf{x} = [x(0), \ldots, x(MN-1)]^T$, the
GFDM signal can be represented as multiplication of a modulation
matrix A of size MN x MN to the data vector d, ifT^ . 

$$\mathbf{x} = \mathbf{A}\mathbf{d}. \tag{3}$$

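Modulation matrix $\mathbf{A}$ encompasses all signal processing
steps involved in modulation. Let
$\mathbf{g} = [g_0, \ldots, g_{MN-1}]^T$ hold all the coefficients of
the pulse shaping/prototype filter with the length $MN$; the
elements of $\mathbf{A}$ can then be represented as

$$[\mathbf{A}]_{nm} = g_{\{(n - mN) \bmod MN\}}\, e^{j\frac{2\pi n}{N}\lfloor m/M \rfloor}. \tag{4}$$

Based on the preceding equations up to (4), the matrix $\mathbf{A}$
can be written as

$$\mathbf{A} = [\mathcal{G} \;\; \mathcal{E}_1\mathcal{G} \;\; \cdots \;\; \mathcal{E}_{N-1}\mathcal{G}], \tag{5}$$

where $\mathcal{G}$ is an $MN \times M$ matrix whose first column
contains the samples of the prototype filter $\mathbf{g}$ and whose
consecutive columns are copies of the previous column circularly
shifted by $N$ samples.
$\mathcal{E}_i = \mathrm{diag}\{[\mathbf{e}_i^T, \ldots, \mathbf{e}_i^T]^T\}$
is an $MN \times MN$ diagonal matrix whose diagonal elements are
comprised of $M$ concatenated copies of the vector
$\mathbf{e}_i = [1, e^{j\frac{2\pi i}{N}}, \ldots, e^{j\frac{2\pi i}{N}(N-1)}]^T$.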
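
To make the structure of the modulation matrix concrete, the following sketch builds A in two ways, element-wise as in (4) and block-wise as in (5), and compares A d with the double sum in (2). It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the prototype filter is random test data, and the sizes N = 4, M = 3 are chosen only to keep the check fast.

```python
import numpy as np

def modulation_matrix(g, N, M):
    """Element-wise form (4): g[(n - m*N) mod MN] * exp(j*2*pi*n*floor(m/M)/N)."""
    MN = M * N
    n = np.arange(MN)[:, None]
    m = np.arange(MN)[None, :]
    return g[(n - m * N) % MN] * np.exp(2j * np.pi * n * (m // M) / N)

def modulation_matrix_blocks(g, N, M):
    """Block form (5): [G, E_1 G, ..., E_{N-1} G]."""
    MN = M * N
    # Columns of G are copies of g circularly shifted by multiples of N.
    G = np.stack([np.roll(g, m * N) for m in range(M)], axis=1)
    n = np.arange(MN)
    return np.hstack([np.exp(2j * np.pi * i * n / N)[:, None] * G
                      for i in range(N)])

def gfdm_direct(d, g, N, M):
    """Double sum (2); d has shape (N, M) with d[i, m] = d_i(m)."""
    MN = M * N
    x = np.zeros(MN, dtype=complex)
    for n in range(MN):
        for i in range(N):
            for m in range(M):
                x[n] += d[i, m] * g[(n - m * N) % MN] * np.exp(2j * np.pi * i * n / N)
    return x

N, M = 4, 3                                   # small illustrative sizes
rng = np.random.default_rng(0)
g = rng.standard_normal(M * N)                # hypothetical real prototype filter
d = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

A_elem = modulation_matrix(g, N, M)
A_blk = modulation_matrix_blocks(g, N, M)
x_mat = A_elem @ d.reshape(-1)                # d stacked as [d_0; ...; d_{N-1}]
x_sum = gfdm_direct(d, g, N, M)
print(np.allclose(A_elem, A_blk), np.allclose(x_mat, x_sum))
```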

GFDM systems use frequency domain equalization (FDE)
to tackle the wireless channel impairments and reduce the
channel equalization complexity. In those systems, a CP which
is longer than the channel delay spread is added to the
beginning of the GFDM block to accommodate the channel
transient period. If $N_{\mathrm{CP}}$ is the CP length, the last
$N_{\mathrm{CP}}$ elements of the vector $\mathbf{x}$ are appended to
its beginning in order to form the transmitted signal vector
$\bar{\mathbf{x}}$ whose length is $MN + N_{\mathrm{CP}}$. Let
$\mathbf{h} = [h_0, \ldots, h_{N_{\mathrm{ch}}-1}]^T$ be the channel
impulse response. Thus, the CP length $N_{\mathrm{CP}}$ needs to be
longer than the channel length $N_{\mathrm{ch}}$. The received signal
which has gone through the channel, after CP removal, can be shown as

$$\mathbf{r} = \mathbf{H}\mathbf{x} + \boldsymbol{\nu}, \tag{6}$$

where $\boldsymbol{\nu}$ is the complex additive white Gaussian noise
(AWGN) vector, i.e.,
$\boldsymbol{\nu} \sim \mathcal{CN}(\mathbf{0}, \sigma_\nu^2 \mathbf{I}_{MN})$,
$\sigma_\nu^2$ is the noise variance,
$\mathbf{H} = \mathrm{circ}\{\tilde{\mathbf{h}}\}$ and
$\tilde{\mathbf{h}}$ is the zero padded version of $\mathbf{h}$ to
have the same length as $\mathbf{x}$. Because $\mathbf{H}$ is a
circulant matrix, an FDE procedure can be performed to compensate
for the multipath channel impairments. With the assumption of having
perfect synchronization and channel estimates, the equalized signal
can be obtained as

$$\mathbf{y} = \mathcal{F}_{MN}^H \mathcal{H}^{-1} \mathcal{F}_{MN}\,\mathbf{r}, \tag{7}$$

where $\mathcal{F}_{MN}$ is the $MN$-point normalized discrete Fourier
transform (DFT) matrix and $\mathcal{H}^{-1}$ is a diagonal matrix
whose diagonal elements are reciprocals of the elements of the vector
obtained from taking the $MN$-point DFT of the zero padded
version of $\mathbf{h}$, viz., $\tilde{\mathbf{h}}$. The vector
$\mathbf{y} = [y_0, \ldots, y_{MN-1}]^T$ is the output of the FDE
block.
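
The CP and FDE steps described above can be sketched numerically as follows. This is a minimal, noiseless illustration and not the authors' implementation: the function names, the three-tap channel and the CP length are assumptions chosen for the example. It shows that, once the CP is at least as long as the channel memory, CP removal turns the linear channel into a circular one, so dividing by the DFT of the zero padded channel recovers the transmitted block.

```python
import numpy as np

def add_cp(x, n_cp):
    """Prepend the last n_cp samples of x as a cyclic prefix."""
    return np.concatenate([x[-n_cp:], x])

def channel(x_cp, h):
    """Linear convolution with the channel, truncated to the input length."""
    return np.convolve(x_cp, h)[:len(x_cp)]

def fde(r, h):
    """One-tap frequency domain equalization of a CP-free block r."""
    H = np.fft.fft(h, n=len(r))        # DFT of the zero padded channel
    return np.fft.ifft(np.fft.fft(r) / H)

rng = np.random.default_rng(0)
x = rng.standard_normal(12) + 1j * rng.standard_normal(12)   # stand-in for a GFDM block
h = np.array([1.0, 0.5, 0.25])         # hypothetical channel impulse response
n_cp = 4                               # longer than the channel memory

received = channel(add_cp(x, n_cp), h)
r = received[n_cp:]                    # CP removal
y = fde(r, h)
print(np.allclose(y, x))
```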

In order to suppress or remove the ICI due to non-orthogonality
of the subcarriers and estimate the transmitted
data vector $\mathbf{d}$ from the equalized signal vector, three
linear GFDM receivers, namely MF, ZF and MMSE detectors, are
considered in this paper.

As it was discussed in, na, the transmitted symbols can 
be recovered through matched filtering

$$\hat{\mathbf{d}}_{\mathrm{MF}} = \mathbf{A}^H \mathbf{y}. \tag{8}$$

However, the MF receiver cannot completely remove the ICI.

Hence, the ZF solution can be utilized to completely eliminate
the ICI that is caused by non-orthogonality of the subcarriers.
The ZF estimate of the transmitted data vector can be found
as

$$\hat{\mathbf{d}}_{\mathrm{ZF}} = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H \mathbf{y}. \tag{9}$$

Since $(\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H$ can have large
values, its multiplication by $\mathbf{y}$ can result in noise
enhancement. This noise amplification problem can be taken care of
by utilizing the MMSE receiver

$$\hat{\mathbf{d}}_{\mathrm{MMSE}} = (\mathbf{A}^H\mathbf{A} + \sigma_\nu^2 \mathbf{I}_{MN})^{-1}\mathbf{A}^H \mathbf{y}. \tag{10}$$
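
As a reference point for the low complexity receivers derived later, the three linear detectors in (8) to (10) can be written directly with dense linear algebra. The sketch below is only a baseline under assumptions: the function names are hypothetical, and the prototype filter is a made-up, well-conditioned test filter (not a practical pulse shape) so that A is invertible. In the noiseless case with zero noise variance, the ZF and MMSE estimates coincide and recover the data exactly.

```python
import numpy as np

def modulation_matrix(g, N, M):
    """Modulation matrix A following (4)."""
    MN = M * N
    n = np.arange(MN)[:, None]
    m = np.arange(MN)[None, :]
    return g[(n - m * N) % MN] * np.exp(2j * np.pi * n * (m // M) / N)

def linear_receivers(A, y, noise_var):
    """Direct MF, ZF and MMSE estimates, equations (8), (9) and (10)."""
    AH = A.conj().T
    gram = AH @ A
    d_mf = AH @ y
    d_zf = np.linalg.solve(gram, d_mf)
    d_mmse = np.linalg.solve(gram + noise_var * np.eye(A.shape[1]), d_mf)
    return d_mf, d_zf, d_mmse

N, M = 4, 3
rng = np.random.default_rng(0)
# Hypothetical real filter: a dominant first tap in every polyphase branch
# keeps A well conditioned for this illustration.
g = rng.uniform(-0.5, 0.5, M * N)
g[:N] += 2.0
bits = rng.choice([-1.0, 1.0], size=(2, M * N))
d = (bits[0] + 1j * bits[1]) / np.sqrt(2)      # unit-variance QPSK symbols

A = modulation_matrix(g, N, M)
y = A @ d                                      # ideal channel, no noise
d_mf, d_zf, d_mmse = linear_receivers(A, y, noise_var=0.0)
print(np.allclose(d_zf, d), np.allclose(d_mmse, d_zf))
```

Each direct solve costs on the order of $(MN)^3$ operations, which is what motivates the structured receivers of Section IV.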

Fig. 1 depicts the baseband block diagram of a GFDM
transceiver when we have perfect synchronization in time
and frequency between the transmitter and receiver. Fig. 1
summarizes the modulation and demodulation process that
is discussed above. It is worth mentioning that the $g_n$'s for
$n = 0, \ldots, MN-1$ are the prototype filter coefficients and
the $\gamma_n$'s are the receiver filter coefficients, which can be
taken from the coefficients of the MF, ZF or MMSE receiver filter.
As mentioned in Section I, GFDM is a type of filter bank
multicarrier system with circular pulse shaping. Therefore, the
GFDM transmitter and receiver can be thought of as a pair
of synthesis and analysis filter banks, respectively.

Fig. 2. Concatenation of (a) and (b) shows the implementation of the proposed GFDM transmitter.

From equations (3) and (8) to (10), one realizes that the direct
matrix multiplications and inversions that are involved demand
a very large computational complexity, as all the matrices
are of size $MN \times MN$, with $N$ usually being large, and
such complexity may not be affordable for practical systems.
Therefore, in the remainder of this paper, low complexity
techniques will be proposed that can substantially reduce
the computational cost of the synthesis and analysis filter
banks that are shown in Fig. 1 while maintaining the optimal
performance.

III. Proposed GFDM Transmitter 

This section presents our proposed low complexity GFDM
transmitter design and implementation. In the following
subsections, we will show how the synthesis filter bank of Fig. 1
can be simplified to have a very low computational load.

A. GFDM transmitter design 

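Starting from (3), one can realize that direct multiplication
of the matrix $\mathbf{A}$ by the data vector $\mathbf{d}$ is a
complex operation which demands $(MN)^2$ complex multiplications.
Therefore, complexity will be an issue for practical systems as the
number of subcarriers and/or the parameter $M$ increases.
Accordingly, a low complexity implementation technique for the
GFDM transmitter has to be sought. To this end, equation (3)
can be written as

$$\mathbf{x} = \mathbf{A}\mathbf{d} = \mathbf{A}\mathcal{F}_b^H\mathcal{F}_b\,\mathbf{d}, \tag{11}$$

where $\mathcal{F}_b$ is the $MN \times MN$ normalized block DFT
matrix that includes the $M \times M$ submatrices
$\frac{1}{\sqrt{N}} e^{-j\frac{2\pi n i}{N}}\mathbf{I}_M$,
$n, i = 0, \ldots, N-1$. Validity of equation (11) is based on the
fact that $\mathcal{F}_b^H\mathcal{F}_b = \mathbf{I}_{MN}$. As it is
derived in Appendix A, the matrix resulting from multiplication of
the block DFT matrix $\mathcal{F}_b$ by $\mathbf{A}^H$ is sparse and
is comprised of the prototype filter coefficients scaled by
$\sqrt{N}$. From equation (11), it can be inferred that
$\mathbf{A}\mathcal{F}_b^H$ is also sparse since it is the conjugate
transpose of $\mathcal{F}_b\mathbf{A}^H$. Hence, our strategy allows
us to make the matrix $\mathbf{A}$ sparse and real, as the prototype
filter is usually chosen as a real filter. Due to (11) and the
definition of $\mathcal{F}_b$, $\mathcal{F}_b\mathbf{d}$ can be
implemented by performing $M$ DFT operations of size $N$ on the data
samples, i.e., one per GFDM symbol. Let
$\tilde{\mathbf{d}} = \mathcal{F}_b\mathbf{d} = [\tilde{\mathbf{d}}_0^T, \ldots, \tilde{\mathbf{d}}_{N-1}^T]^T$,
where the $M \times 1$ vector
$\tilde{\mathbf{d}}_i = [\tilde{d}_i(0), \ldots, \tilde{d}_i(M-1)]^T$
contains the $i$th output of each DFT block; then (11) can be
rearranged as

$$\mathbf{x} = \boldsymbol{\Gamma}^H\tilde{\mathbf{d}} = \sum_{i=0}^{N-1} \boldsymbol{\Gamma}_i^H \tilde{\mathbf{d}}_i, \tag{12}$$

where $\boldsymbol{\Gamma} = \mathcal{F}_b\mathbf{A}^H = [\boldsymbol{\Gamma}_0^T, \ldots, \boldsymbol{\Gamma}_{N-1}^T]^T$.
Let $\kappa = (N - i) \bmod N$. As discussed in Appendix B,
the $M \times MN$ matrices $\boldsymbol{\Gamma}_i$ have only $M$
non-zero columns and the sets of those column indices are mutually
exclusive with respect to each other. As a result,
$\boldsymbol{\Gamma}_i^H\tilde{\mathbf{d}}_i$ will be a sparse
vector with only $M$ non-zero elements located at the positions
$\kappa, \kappa+N, \ldots, \kappa+(M-1)N$. On the basis of the
derivations that are presented in Appendix A, the non-zero elements
of $\boldsymbol{\Gamma}_i^H\tilde{\mathbf{d}}_i$ can be obtained from
the $M$-point circular convolution of $\tilde{\mathbf{d}}_i$ with
the $\kappa$th polyphase component of the prototype filter,
$\mathbf{g}_\kappa$, scaled by $\sqrt{N}$. Therefore, defining the
non-zero elements of $\boldsymbol{\Gamma}_i^H\tilde{\mathbf{d}}_i$ as
the vector
$\mathbf{x}_\kappa = [x_\kappa, x_{\kappa+N}, \ldots, x_{\kappa+(M-1)N}]^T$,
we get

$$\mathbf{x}_\kappa = \bar{\mathbf{g}}_\kappa \circledast \tilde{\mathbf{d}}_i, \tag{13}$$

where $\bar{\mathbf{g}}_\kappa = \sqrt{N}\mathbf{g}_\kappa$.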
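
The two-stage transmitter described by (11) to (13) can be sketched as follows: one N-point DFT per GFDM symbol, followed by N circular convolutions of length M with the polyphase components of the prototype filter. This is a sketch under assumptions rather than the authors' implementation: function names are hypothetical, the filter is random real test data, and the circular convolutions are done through NumPy FFTs purely for brevity. The result is compared against direct multiplication by the modulation matrix of (4).

```python
import numpy as np

def modulation_matrix(g, N, M):
    """Reference modulation matrix A following (4)."""
    MN = M * N
    n = np.arange(MN)[:, None]
    m = np.arange(MN)[None, :]
    return g[(n - m * N) % MN] * np.exp(2j * np.pi * n * (m // M) / N)

def gfdm_tx_fast(d, g, N, M):
    """Low complexity modulator; d has shape (N, M) with d[i, m] = d_i(m)."""
    # Step 1: M normalized N-point DFTs, one per GFDM symbol (column of d).
    d_tilde = np.fft.fft(d, axis=0) / np.sqrt(N)
    # Polyphase components of the filter: g_poly[k, p] = g[k + p*N].
    g_poly = g.reshape(M, N).T
    x = np.zeros(M * N, dtype=complex)
    for i in range(N):
        k = (N - i) % N
        # Step 2: M-point circular convolution with sqrt(N) * g_k, as in (13).
        x_k = np.sqrt(N) * np.fft.ifft(np.fft.fft(g_poly[k]) * np.fft.fft(d_tilde[i]))
        x[k::N] = x_k                  # positions k, k+N, ..., k+(M-1)N
    return x

N, M = 8, 5
rng = np.random.default_rng(0)
g = rng.standard_normal(M * N)         # hypothetical real prototype filter
d = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

x_fast = gfdm_tx_fast(d, g, N, M)
x_ref = modulation_matrix(g, N, M) @ d.reshape(-1)
print(np.allclose(x_fast, x_ref))
```

For a real filter and time-domain convolutions, each of the N branches needs only real-by-complex products, which is where the savings counted in Section V come from.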

B. GFDM transmitter implementation 

In this subsection, implementation of the GFDM transmitter
designed in Section III-A is discussed. From equations
(11) to (13), GFDM modulation based on our design can be
summarized into two steps.

1) $M$ number of $N$-point DFT operations, i.e., application
of an $N$-point DFT to each individual GFDM symbol
which includes $N$ subcarriers. This can be efficiently
implemented by taking advantage of the fast Fourier
transform (FFT) algorithm.

2) $N$ number of $M$-point circular convolution operations.

Therefore, the first and second steps of our GFDM transmitter
can be implemented by cascading the block diagrams
shown in Fig. 2(a) and (b), respectively. The blocks P/S
convert the parallel FFT outputs to serial streams. All the
commutators shown in Fig. 2 turn counter clockwise. Both
commutators located on the right hand side of Fig. 2(a)
and (b) turn after one sample collection. However, the one
located on the left hand side of (b) turns by one position
after sending $M$ samples to each $M$-point circular convolution
block.


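IV. Proposed GFDM Receiver

In this section, we derive low complexity ZF and MMSE
receivers for GFDM systems. It is worth mentioning that
our solutions are direct and hence the lower complexity of these
receivers comes for free as they do not result in any
performance loss, thanks to the special structure of the matrix
$\mathbf{A}^H\mathbf{A}$. The characteristics of
$\mathbf{A}^H\mathbf{A}$ will be discussed in the next subsection
and then we will derive our proposed receivers on the basis of
those traits.

A. Block-diagonalization of the matrix $\mathbf{A}^H\mathbf{A}$

The key idea behind our proposed GFDM receiver techniques
is to take advantage of the particular structure of the
matrix $\mathbf{A}^H\mathbf{A}$ which is present in both ZF and MMSE
receiver formulations. Using (5), one can calculate
$\mathbf{A}^H\mathbf{A}$ and find out that it has the following
structure

$$\mathbf{A}^H\mathbf{A} = \begin{bmatrix}
\mathcal{G}^H\mathcal{G} & \mathcal{G}^H\mathcal{E}_1\mathcal{G} & \cdots & \mathcal{G}^H\mathcal{E}_{N-1}\mathcal{G} \\
\mathcal{G}^H\mathcal{E}_1^H\mathcal{G} & \mathcal{G}^H\mathcal{G} & \cdots & \mathcal{G}^H\mathcal{E}_{N-2}\mathcal{G} \\
\vdots & \vdots & \ddots & \vdots \\
\mathcal{G}^H\mathcal{E}_{N-1}^H\mathcal{G} & \mathcal{G}^H\mathcal{E}_{N-2}^H\mathcal{G} & \cdots & \mathcal{G}^H\mathcal{G}
\end{bmatrix}. \tag{14}$$

From the definition of the vector $\mathbf{e}_i$, it can be
straightforwardly perceived that
$\mathcal{E}_i^H = \mathcal{E}_{N-i}$ and hence
$\mathcal{E}_{N-i}^H = \mathcal{E}_i$. Therefore, the block columns
of $\mathbf{A}^H\mathbf{A}$ as shown in (14) are circularly shifted
with respect to each other. Accordingly, $\mathbf{A}^H\mathbf{A}$ is
a block-circulant matrix with blocks of size $M \times M$. Following
a similar line of derivations as in El] and a,
$\mathbf{A}^H\mathbf{A}$ can be expanded as follows

$$\mathbf{A}^H\mathbf{A} = \mathcal{F}_b^H \mathcal{D} \mathcal{F}_b, \tag{15}$$

where $\mathcal{D}$ is an $MN \times MN$ block-diagonal matrix,
$\mathcal{D} = \mathrm{diag}\{\mathcal{D}_0, \ldots, \mathcal{D}_{N-1}\}$,
and the $\mathcal{D}_i$'s are $M \times M$ block matrices.

From (15), $\mathcal{D}$ can be derived as

$$\mathcal{D} = \mathcal{F}_b(\mathbf{A}^H\mathbf{A})\mathcal{F}_b^H. \tag{16}$$

As it is explained in Appendix B, the $\mathcal{D}_i$'s can be
derived from polyphase components of the prototype filter,

$$\mathcal{D}_i = N\,\mathrm{circ}\{\hat{\mathbf{g}}_\kappa \circledast \mathbf{g}_\kappa\}, \tag{17}$$

where $\kappa = (N - i) \bmod N$, $\mathbf{g}_\kappa$ is the
$\kappa$th polyphase component of $\mathbf{g}$ and
$\hat{\mathbf{g}}_\kappa = [g_\kappa, g_{\kappa+(M-1)N}, \ldots, g_{\kappa+N}]^T$
is its circularly folded version. As (17) highlights, the
$\mathcal{D}_i$'s are all real and circulant matrices.

B. Low complexity MF receiver

Based on equation (8), direct implementation of the MF receiver
involves a matrix to vector multiplication which has
the computational cost of $(MN)^2$ complex multiplications.
This procedure becomes highly complex for large values of
$N$ and/or $M$, which is usually the case. As discussed in
Appendix A, multiplication of $\mathbf{A}^H$ by the block DFT matrix
results in a sparse matrix. Due to the fact that
$\mathcal{F}_b^H\mathcal{F}_b = \mathbf{I}_{MN}$, similar to the
transmitter equation (11), equation (8) can be written as

$$\hat{\mathbf{d}}_{\mathrm{MF}} = \mathcal{F}_b^H\mathcal{F}_b\mathbf{A}^H\mathbf{y} = \mathcal{F}_b^H\boldsymbol{\Gamma}\mathbf{y}, \tag{18}$$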

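where $\boldsymbol{\Gamma}$ is a sparse matrix with only $NM^2$
non-zero elements that are scaled versions of the prototype filter
coefficients. The closed form of
$\boldsymbol{\Gamma} = [\boldsymbol{\Gamma}_0^T, \ldots, \boldsymbol{\Gamma}_{N-1}^T]^T$
is derived in Appendix B and it is shown that the matrix is real
valued and comprised of the prototype filter elements. Non-zero
columns of the $M \times MN$ block matrices
$\boldsymbol{\Gamma}_i$ are circularly shifted copies of each other.
Hence, multiplication of $\boldsymbol{\Gamma}_i$ and $\mathbf{y}$ is
equivalent to the $M$-point circular convolution of $M$ equidistant
elements of $\mathbf{y}$, starting from the $\kappa$th position,
with the circularly folded version of the $\kappa$th polyphase
component of $\mathbf{g}$ scaled by $\sqrt{N}$, viz.,
$\sqrt{N}\hat{\mathbf{g}}_\kappa$. Usually, the prototype filter
coefficients are real-valued. Thus, $\boldsymbol{\Gamma}$ is
real-valued. Multiplication of $\mathcal{F}_b^H$ by the vector
$\boldsymbol{\Gamma}\mathbf{y}$ can be implemented by applying $M$
number of $N$-point IDFT operations. Let
$\tilde{\mathbf{y}} = \boldsymbol{\Gamma}\mathbf{y} = [\tilde{\mathbf{y}}_0^T, \ldots, \tilde{\mathbf{y}}_{N-1}^T]^T$
and
$\mathbf{y}_\kappa = [y_\kappa, y_{\kappa+N}, \ldots, y_{\kappa+(M-1)N}]^T$.
Therefore, we have

$$\tilde{\mathbf{y}}_i = \boldsymbol{\Gamma}_i\mathbf{y} = \boldsymbol{\nu}_\kappa \circledast \mathbf{y}_\kappa, \tag{19}$$

where $\boldsymbol{\nu}_\kappa = \sqrt{N}\hat{\mathbf{g}}_\kappa$.
Finally, the MF estimates of $\mathbf{d}$ can be obtained as

$$\hat{\mathbf{d}}_{\mathrm{MF}} = \mathcal{F}_b^H\tilde{\mathbf{y}}. \tag{20}$$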
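
The matched filter structure of (18) to (20) can be checked numerically with the sketch below: each branch circularly convolves M equidistant samples of y with the folded, scaled polyphase component, and the branch outputs then pass through one N-point IDFT per time slot. The function names are hypothetical, the prototype filter is random real test data (the derivation assumes a real filter), and FFT-based convolution is used only for brevity. The result is compared against the direct product with the conjugate transpose of A.

```python
import numpy as np

def modulation_matrix(g, N, M):
    """Reference modulation matrix A following (4)."""
    MN = M * N
    n = np.arange(MN)[:, None]
    m = np.arange(MN)[None, :]
    return g[(n - m * N) % MN] * np.exp(2j * np.pi * n * (m // M) / N)

def gfdm_mf_fast(y, g, N, M):
    """Low complexity MF receiver; returns the estimates stacked as [d_0; ...; d_{N-1}]."""
    g_poly = g.reshape(M, N).T                      # g_poly[k, p] = g[k + p*N]
    y_tilde = np.zeros((N, M), dtype=complex)
    for i in range(N):
        k = (N - i) % N
        g_fold = np.roll(g_poly[k][::-1], 1)        # circularly folded g_k
        nu_k = np.sqrt(N) * g_fold                  # receiver filter of (19)
        y_tilde[i] = np.fft.ifft(np.fft.fft(nu_k) * np.fft.fft(y[k::N]))
    # (20): normalized N-point IDFT across the branches, one per time slot.
    return (np.fft.ifft(y_tilde, axis=0) * np.sqrt(N)).reshape(-1)

N, M = 8, 5
rng = np.random.default_rng(0)
g = rng.standard_normal(M * N)                      # hypothetical real prototype filter
y = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)

d_mf_fast = gfdm_mf_fast(y, g, N, M)
d_mf_ref = modulation_matrix(g, N, M).conj().T @ y
print(np.allclose(d_mf_fast, d_mf_ref))
```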


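C. Low complexity ZF receiver

Inserting (15) into (9), we get

$$\hat{\mathbf{d}}_{\mathrm{ZF}} = \mathcal{F}_b^H\mathcal{D}^{-1}\mathcal{F}_b\mathbf{A}^H\mathbf{y}. \tag{21}$$

Multiplication of the matrix $\mathbf{A}^H$ by the vector
$\mathbf{y}$ is the first source of computational burden in the ZF
receiver, with a computational cost of $(MN)^2$ CMs. However, this
complexity can be reduced by taking advantage of the sparsity of the
matrix $\boldsymbol{\Gamma} = \mathcal{F}_b\mathbf{A}^H$ as
suggested in the previous subsection. Equation (19) can be written
as
$\tilde{\mathbf{y}}_i = \boldsymbol{\Gamma}_i\mathbf{y} = \sqrt{N}\,\mathrm{circ}\{\hat{\mathbf{g}}_\kappa\}\mathbf{y}_\kappa$.
Let
$\bar{\mathbf{y}} = \mathcal{D}^{-1}\tilde{\mathbf{y}} = [\bar{\mathbf{y}}_0^T, \ldots, \bar{\mathbf{y}}_{N-1}^T]^T$
where

$$\bar{\mathbf{y}}_i = \sqrt{N}\,\mathcal{D}_i^{-1}\mathrm{circ}\{\hat{\mathbf{g}}_\kappa\}\mathbf{y}_\kappa. \tag{22}$$

Therefore, rearranging equation (17) as
$\mathcal{D}_i = N\,\mathrm{circ}\{\hat{\mathbf{g}}_\kappa\}\mathrm{circ}\{\mathbf{g}_\kappa\}$
and inserting it into (22), we have

$$\begin{aligned}
\bar{\mathbf{y}}_i &= \frac{1}{\sqrt{N}}\left(\mathrm{circ}\{\hat{\mathbf{g}}_\kappa\}\mathrm{circ}\{\mathbf{g}_\kappa\}\right)^{-1}\mathrm{circ}\{\hat{\mathbf{g}}_\kappa\}\mathbf{y}_\kappa \\
&= \frac{1}{\sqrt{N}}\left(\mathrm{circ}\{\mathbf{g}_\kappa\}\right)^{-1}\mathbf{y}_\kappa \\
&= \mathbf{q}_\kappa \circledast \mathbf{y}_\kappa,
\end{aligned} \tag{23}$$

where $\mathbf{q}_\kappa$ includes the first column of the circulant
matrix $(\mathrm{circ}\{\mathbf{g}_\kappa\})^{-1}$ scaled by
$\frac{1}{\sqrt{N}}$. Because the coefficients of the prototype
filter are known, the vectors $\mathbf{q}_\kappa$ can be calculated
offline. Additionally, since the prototype filter coefficients are
real, the $\mathbf{q}_\kappa$'s are also real. From (23), one may
realize that calculation of the vector $\bar{\mathbf{y}}$ needs $N$
number of $M$-point circular convolutions. After acquiring
$\bar{\mathbf{y}}$, the ZF estimates of the transmitted symbols can
be obtained as

$$\hat{\mathbf{d}}_{\mathrm{ZF}} = \mathcal{F}_b^H\bar{\mathbf{y}}. \tag{24}$$

As can be inferred from (24), finding
$\hat{\mathbf{d}}_{\mathrm{ZF}}$ from $\bar{\mathbf{y}}$ requires $M$
number of $N$-point inverse DFT (IDFT) operations.

Fig. 3. Unified implementation of our proposed MF, ZF and MMSE-based GFDM receivers from cascading the block diagrams (a) and (b).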
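
The following sketch illustrates two points from this section under stated assumptions: that the block DFT matrix block-diagonalizes the Gram matrix of A as in (15) and (16), and that the ZF receiver of (21) to (24) matches the direct solution of (9). Function names are hypothetical, the filter is a made-up real test filter with a dominant tap per polyphase branch so that every circulant block is safely invertible, and the filters q are computed through the DFT, which is one of several equivalent ways to obtain the first column of the inverse circulant matrix.

```python
import numpy as np

def modulation_matrix(g, N, M):
    """Reference modulation matrix A following (4)."""
    MN = M * N
    n = np.arange(MN)[:, None]
    m = np.arange(MN)[None, :]
    return g[(n - m * N) % MN] * np.exp(2j * np.pi * n * (m // M) / N)

def block_dft(N, M):
    """Normalized block DFT matrix: Kronecker product of F_N and I_M."""
    F_N = np.fft.fft(np.eye(N)) / np.sqrt(N)
    return np.kron(F_N, np.eye(M))

def gfdm_zf_fast(y, g, N, M):
    """Low complexity ZF receiver following (23) and (24)."""
    g_poly = g.reshape(M, N).T                      # g_poly[k, p] = g[k + p*N]
    y_bar = np.zeros((N, M), dtype=complex)
    for i in range(N):
        k = (N - i) % N
        # q_k: first column of circ{g_k}^{-1}, scaled by 1/sqrt(N) (offline in practice).
        q_k = np.real(np.fft.ifft(1.0 / np.fft.fft(g_poly[k]))) / np.sqrt(N)
        y_bar[i] = np.fft.ifft(np.fft.fft(q_k) * np.fft.fft(y[k::N]))
    return (np.fft.ifft(y_bar, axis=0) * np.sqrt(N)).reshape(-1)

N, M = 4, 3
rng = np.random.default_rng(1)
g = rng.uniform(-0.5, 0.5, M * N)
g[:N] += 2.0                                        # hypothetical well-conditioned filter
A = modulation_matrix(g, N, M)
gram = A.conj().T @ A

# Block-diagonalization check for (16): everything off the M x M diagonal blocks vanishes.
F_b = block_dft(N, M)
D = F_b @ gram @ F_b.conj().T
on_block = np.kron(np.eye(N), np.ones((M, M))).astype(bool)
off_block_max = np.abs(D[~on_block]).max()

d = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)
y = A @ d + 0.1 * (rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N))
d_zf_fast = gfdm_zf_fast(y, g, N, M)
d_zf_ref = np.linalg.solve(gram, A.conj().T @ y)
print(off_block_max < 1e-9, np.allclose(d_zf_fast, d_zf_ref))
```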

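D. Low complexity MMSE receiver

Using (15) in (10), we get

$$\hat{\mathbf{d}}_{\mathrm{MMSE}} = (\mathcal{F}_b^H\mathcal{D}\mathcal{F}_b + \sigma_\nu^2\mathbf{I}_{MN})^{-1}\mathbf{A}^H\mathbf{y} = \mathcal{F}_b^H\bar{\mathcal{D}}^{-1}\mathcal{F}_b\mathbf{A}^H\mathbf{y}, \tag{25}$$

where
$\bar{\mathcal{D}} = \mathcal{D} + \sigma_\nu^2\mathbf{I}_{MN} = \mathrm{diag}\{\bar{\mathcal{D}}_0, \ldots, \bar{\mathcal{D}}_{N-1}\}$
and $\bar{\mathcal{D}}_i = \mathcal{D}_i + \sigma_\nu^2\mathbf{I}_M$.
Recalling the circulant property of $\mathcal{D}_i$ from (17), it
can be understood that $\bar{\mathcal{D}}_i$ is also circulant and
can be expanded as
$\bar{\mathcal{D}}_i = \mathcal{F}_M^H(\boldsymbol{\Phi}_\kappa^*\boldsymbol{\Phi}_\kappa + \sigma_\nu^2\mathbf{I}_M)\mathcal{F}_M$
where
$\boldsymbol{\Phi}_\kappa = \sqrt{MN}\,\mathrm{diag}\{\mathcal{F}_M\mathbf{g}_\kappa\}$.
Let
$\breve{\mathbf{y}} = [\breve{\mathbf{y}}_0^T, \ldots, \breve{\mathbf{y}}_{N-1}^T]^T = \bar{\mathcal{D}}^{-1}\mathcal{F}_b\mathbf{A}^H\mathbf{y}$;
we can write

$$\breve{\mathbf{y}}_i = \mathcal{F}_M^H(\boldsymbol{\Phi}_\kappa^*\boldsymbol{\Phi}_\kappa + \sigma_\nu^2\mathbf{I}_M)^{-1}\boldsymbol{\Phi}_\kappa^*\mathcal{F}_M\mathbf{y}_\kappa = \mathbf{p}_\kappa \circledast \mathbf{y}_\kappa, \tag{26}$$

where $\mathbf{p}_\kappa$ includes the first column of the circulant
matrix
$\mathcal{F}_M^H\{(\boldsymbol{\Phi}_\kappa^*\boldsymbol{\Phi}_\kappa + \sigma_\nu^2\mathbf{I}_M)^{-1}\boldsymbol{\Phi}_\kappa^*\}\mathcal{F}_M$.
Since, in the MMSE receiver, the matrix $\bar{\mathcal{D}}$ depends
on $\sigma_\nu^2$ and the receiver cannot be simplified as in (19)
or (23), the circular convolution of (26) needs to be calculated in
the frequency domain, known as fast convolution, in order to have
the lowest complexity. After obtaining $\breve{\mathbf{y}}$, the
MMSE estimates of the transmitted symbols can be found as

$$\hat{\mathbf{d}}_{\mathrm{MMSE}} = \mathcal{F}_b^H\breve{\mathbf{y}}. \tag{27}$$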
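
The frequency domain form of the MMSE receiver can be sketched as below. For each branch, the diagonal of the frequency response matrix is obtained from an M-point DFT of the polyphase component, the per-bin MMSE weight is applied to the DFT of the M equidistant received samples, and an N-point IDFT per time slot follows as in (27). The function names, the test filter and the noise variance are assumptions for illustration only; the output is compared against the direct solution of (10).

```python
import numpy as np

def modulation_matrix(g, N, M):
    """Reference modulation matrix A following (4)."""
    MN = M * N
    n = np.arange(MN)[:, None]
    m = np.arange(MN)[None, :]
    return g[(n - m * N) % MN] * np.exp(2j * np.pi * n * (m // M) / N)

def gfdm_mmse_fast(y, g, N, M, noise_var):
    """Low complexity MMSE receiver following (26) and (27), real filter assumed."""
    g_poly = g.reshape(M, N).T                      # g_poly[k, p] = g[k + p*N]
    y_breve = np.zeros((N, M), dtype=complex)
    for i in range(N):
        k = (N - i) % N
        # Diagonal of Phi_k: sqrt(MN) times the normalized M-point DFT of g_k,
        # i.e. sqrt(N) times the unnormalized DFT.
        phi = np.sqrt(N) * np.fft.fft(g_poly[k])
        weight = np.conj(phi) / (np.abs(phi) ** 2 + noise_var)
        # Fast convolution: multiply in the frequency domain, then transform back.
        y_breve[i] = np.fft.ifft(weight * np.fft.fft(y[k::N]))
    return (np.fft.ifft(y_breve, axis=0) * np.sqrt(N)).reshape(-1)

N, M = 8, 5
noise_var = 0.1                                     # hypothetical noise variance
rng = np.random.default_rng(2)
g = rng.standard_normal(M * N)                      # hypothetical real prototype filter
A = modulation_matrix(g, N, M)
d = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(M * N)
                                  + 1j * rng.standard_normal(M * N))
y = A @ d + noise

d_mmse_fast = gfdm_mmse_fast(y, g, N, M, noise_var)
gram = A.conj().T @ A
d_mmse_ref = np.linalg.solve(gram + noise_var * np.eye(M * N), A.conj().T @ y)
print(np.allclose(d_mmse_fast, d_mmse_ref))
```

Unlike the ZF case, the weights depend on the noise variance, so they cannot be fully precomputed; only the DFTs of the polyphase components can be stored.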


E. Receiver implementation 

In this subsection, we present a unified implementation
of the MF, ZF and MMSE receivers that we proposed in
Sections IV-B, IV-C and IV-D. As Fig. 3 depicts, the proposed
GFDM receivers can be implemented by cascading Fig. 3(a)
and (b). It is worth mentioning that the commutator on the
right hand side of Fig. 3(a) will turn by one position, in the
clockwise direction, after collecting $M$ samples from the $i$th
branch, i.e., the $M \times 1$ vector
$\tilde{\mathbf{y}}_i$/$\bar{\mathbf{y}}_i$/$\breve{\mathbf{y}}_i$.
In the MF and ZF receivers, the vectors $\boldsymbol{\gamma}_i$ are
replaced by the $\boldsymbol{\nu}_\kappa$'s and
$\mathbf{q}_\kappa$'s, respectively, and in the MMSE receiver, they
will be replaced by the $\mathbf{p}_\kappa$'s. Because in the MF and
ZF receivers the vectors $\boldsymbol{\nu}_\kappa$ and
$\mathbf{q}_\kappa$ are fixed and only depend on the prototype
filter coefficients, they can be calculated offline and hence
there is no need for their real-time calculation. However, in
MMSE receivers, the vectors $\mathbf{p}_\kappa$ depend on the signal
to noise ratio and hence they should be calculated in real-time. As
mentioned earlier in Section IV-D, circular convolutions in our
MMSE receiver need to be performed by taking advantage of
fast convolution to keep the complexity low.

Footnote 1: Since $\hat{\mathbf{g}}_\kappa$ is a real vector and the
circularly folded version of $\mathbf{g}_\kappa$,
$\boldsymbol{\Phi}_\kappa^* = \sqrt{MN}\,\mathrm{diag}\{\mathcal{F}_M\hat{\mathbf{g}}_\kappa\}$.

V. Computational Complexity 

In this section, the computational complexity of our proposed
GFDM transmitter and receiver structures is discussed
and compared to the existing ones that are known to have the
lowest complexity, ifTSl . Il20l . In both cases, total number of 
N subcarriers and overlapping factor of M are considered. 


A. Transmitter complexity 

Table I presents the computational complexity of different
GFDM transmitter implementations based on the number of
complex multiplications (CMs).

As discussed in Section III-B, our proposed GFDM transmitter
involves two steps. The first step includes $M$ number of
$N$-point FFT operations, which requires $\frac{MN}{2}\log_2 N$ CMs.
The second step needs $N$ number of $M$-point circular convolutions.
Recalling equation (13), since the $\mathbf{g}_\kappa$'s are
real-valued vectors, one may realize that each $M$-point circular
convolution demands $\frac{M^2}{2}$ CMs. If $M$ is a power of two,
the complexity can be further reduced by performing the circular
convolutions in the frequency domain. This is due to the fact that
circular convolution in time is multiplication in the frequency
domain. Thus, to perform each circular convolution, a pair of
$M$-point FFT and IFFT blocks together with $M$ complex
multiplications by the filter coefficients in the frequency domain
are required.

The complexity relationships that are presented in Table I
are calculated and plotted in Fig. 4 for $N = 1024$ subcarriers
with respect to different values of overlapping factor $M$.

TABLE I
Computational Complexity of Different GFDM Transmitter Implementations

Technique                      | Number of Complex Multiplications
Direct matrix multiplication   | $(MN)^2$
Proposed transmitter in [20]   | $MN(\log_2 N + 2\log_2 M + L)$
Our proposed transmitter       | $\frac{MN}{2}(M + \log_2 N)$

Fig. 4. Computational complexity comparison of different GFDM transmitter
techniques and the OFDM transmitter technique for N = 1024. (Horizontal axis: block length M.)

As
the authors of [20] suggest, $L = 2$ is chosen for calculating
their GFDM transmitter complexity. Because direct
multiplication of $\mathbf{A}$ by the data vector $\mathbf{d}$
demands a large number of CMs and is impractical, we do not present
it in Fig. 4. To give a quantitative indication of the complexity
reduction that our proposed transmitter provides compared with
the direct computation of equation (3): in the same system
setting as used for our other comparisons, i.e., $N = 1024$
and $M \in [1, 21]$, a complexity reduction of around three orders
of magnitude can be achieved. According to Fig. 4, for
small values of $M$ our proposed transmitter structure has a
complexity very close to that of OFDM. However, as $M$
increases, the complexity of our transmitter increases at a
higher pace than that of OFDM. This is due to the overhead of
$\frac{M^2 N}{2}$ CMs compared with OFDM. Compared with the
transmitter structure that we are proposing in this paper, for
small values of $M$ up to 11, the transmitter proposed in [20]
demands about two times higher number of CMs. As $M$
increases, the complexity of our technique gets close to that of
the one proposed in [20]. The GFDM transmitter of [20] is about
3 to 4 times more complex than OFDM.
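
The comparisons quoted above can be reproduced from the expressions in Table I with a few lines of arithmetic. In the sketch below the function names are hypothetical, and the OFDM count (one N-point FFT per symbol, M symbols) is an assumption made for the comparison since it is not listed in Table I.

```python
import math

def cm_direct(N, M):
    """Direct matrix multiplication: (MN)^2 complex multiplications."""
    return (M * N) ** 2

def cm_ref20(N, M, L=2):
    """Transmitter of [20]: MN(log2 N + 2 log2 M + L)."""
    return M * N * (math.log2(N) + 2 * math.log2(M) + L)

def cm_proposed(N, M):
    """Proposed transmitter: (MN/2)(M + log2 N)."""
    return M * N / 2 * (M + math.log2(N))

def cm_ofdm(N, M):
    """Assumed OFDM baseline: M symbols, one N-point FFT each."""
    return M * N / 2 * math.log2(N)

N = 1024
for M in (1, 11, 21):
    direct_gain = cm_direct(N, M) / cm_proposed(N, M)
    ref20_ratio = cm_ref20(N, M) / cm_proposed(N, M)
    ofdm_ratio = cm_proposed(N, M) / cm_ofdm(N, M)
    print(f"M={M:2d}  direct/proposed={direct_gain:8.1f}  "
          f"[20]/proposed={ref20_ratio:4.2f}  proposed/OFDM={ofdm_ratio:4.2f}")
```

With these expressions, the gain over direct multiplication is $2MN/(M + \log_2 N)$, which reaches roughly 1400 at $M = 21$, and the ratio between [20] and the proposed transmitter falls from about 2.2 at $M = 1$ to about 1.3 at $M = 21$.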


B. Receiver complexity 

Table II summarizes the computational complexity of different
GFDM receivers in terms of the number of complex
multiplications. The parameter $I$ is the number of iterations
in the algorithm with interference cancellation.


From Fig. 3, it can be understood that our proposed
receivers involve N M-point circular convolutions and
M N-point IDFT operations. The IDFT
operations can be efficiently implemented using the N-point IFFT
algorithm, which requires $\frac{N}{2}\log_2 N$ CMs. As mentioned earlier,
in the proposed MF and ZF receivers, the vectors $\boldsymbol{\gamma}_i$ have
fixed values and hence can be calculated and stored offline.
Furthermore, the $\boldsymbol{\gamma}_i$'s are real-valued vectors. Thus, the number
of complex multiplications needed for the N M-point
circular convolutions is $\frac{NM^2}{2}$.
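
As a worked check of this count, the two stages can be added up explicitly. This is only a sketch under two assumptions that the text suggests but does not spell out: a real-by-complex product is counted as half a CM, and the total is the plain sum of the two stages. The labels $C_{\rm conv}$, $C_{\rm IFFT}$ and $C_{\rm MF/ZF}$ are introduced here for illustration only.

```latex
\begin{align}
C_{\rm conv} &= N \cdot \frac{M^2}{2}, &
C_{\rm IFFT} &= M \cdot \frac{N}{2}\log_2 N, \\
C_{\rm MF/ZF} &= C_{\rm conv} + C_{\rm IFFT}
              = \frac{MN}{2}\left(M + \log_2 N\right). &&
\end{align}
```

For N = 1024 and M = 7, this evaluates to 3584 × 17 = 60 928 CMs.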

In contrast to the MF and ZF receivers, in the MMSE
receiver, the vectors $\boldsymbol{\gamma}_i$ are not fixed and depend on the
signal-to-noise ratio (SNR). Hence, they need to be calculated
in real time. To this end, as highlighted in Section IV-D,
those operations can be performed by using M-point DFT and
IDFT operations. Because $(\boldsymbol{\Phi}_i^{*}\boldsymbol{\Phi}_i + \sigma^2\mathbf{I}_M)$ is a
real-valued diagonal matrix, its inversion and multiplication
by $\boldsymbol{\Phi}_i$ needs only $\frac{M}{2}$ CMs. The resulting diagonal matrix
is then multiplied by an M × 1 vector,
which needs M CMs. Since M is not necessarily a power
of 2, the complexity of the M-point DFT and IDFT operations in the
implementation of the circular convolutions is taken as
$M^2$. Obviously, if M is a power of 2, a further complexity
reduction is possible by taking advantage of FFT and IFFT
algorithms. Therefore, the complexity of our proposed MMSE
receiver differs from that of the MF and ZF ones only in the
implementation of the circular convolution operations.
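
The DFT-based evaluation of a circular convolution mentioned above can be sketched as follows. This is a minimal illustration, not the receiver itself: the function names and the test vectors are hypothetical, and a naive DFT with on the order of $M^2$ multiplications is used on purpose, since M = 7 is not a power of 2.

```python
import cmath


def dft(x, inverse=False):
    """Naive M-point DFT (or IDFT); costs on the order of M^2 complex multiplications."""
    M = len(x)
    sign = 1 if inverse else -1
    scale = 1 / M if inverse else 1
    return [scale * sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / M)
                        for n in range(M))
            for k in range(M)]


def circular_convolution_direct(a, b):
    """Direct M-point circular convolution: c[k] = sum_m a[m] * b[(k - m) mod M]."""
    M = len(a)
    return [sum(a[m] * b[(k - m) % M] for m in range(M)) for k in range(M)]


def circular_convolution_dft(a, b):
    """Same convolution via DFT, element-wise product and IDFT."""
    spectrum = [p * q for p, q in zip(dft(a), dft(b))]
    return dft(spectrum, inverse=True)


# Illustrative vectors only; M = 7 is deliberately not a power of 2.
a = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5]
b = [0.2, -0.4, 1.0, 0.0, 0.7, 0.1, -0.3]
direct = circular_convolution_direct(a, b)
via_dft = circular_convolution_dft(a, b)
max_error = max(abs(u - v) for u, v in zip(direct, via_dft))
print("largest deviation between the two methods:", max_error)
```

Both routes give the same vector up to rounding. The element-wise product in the DFT domain is where a diagonal weighting, such as the SNR-dependent one of the MMSE receiver, would be applied.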

Table II also presents the complexity of the direct MF, ZF
and MMSE detection techniques, i.e., direct matrix
multiplications and solutions to equations (9) and (10), respectively.
Those solutions involve direct inversion of an MN × MN
matrix, which has a complexity of $\mathcal{O}(M^3N^3)$, and two
vector-by-matrix multiplications with a computational burden of
$2(MN)^2$ CMs.

The complexity formulas that are presented in Table II
are evaluated and plotted in Fig. 5 for different values of the
overlapping factor M ∈ [1, 21], N = 1024 and I = 8 for
the receiver that is proposed in [18]. Based on the results of
[18], I = 8 and L = 2 are considered. Because the
complexity of the MF, ZF and MMSE receivers with direct matrix
inversion and multiplications is prohibitively high compared
with the other techniques (the difference is several orders
of magnitude), they are not presented in Fig. 5. However, to
quantify the amount of complexity reduction that our proposed
techniques provide, in the case of N = 1024 and M = 7,
our proposed MF/ZF receiver is three orders of magnitude
and the proposed MMSE receiver is six orders of magnitude
simpler than the respective direct ones in terms of the
required number of CMs. As Fig. 5 depicts, our proposed
ZF receiver is around an order of magnitude simpler than the
receiver with SIC proposed in [18]. In addition, our proposed
MMSE receiver has 2 to 3 times lower complexity than the
one in [18]. Apart from their lower computational cost compared
with the existing receiver structures, our techniques maintain
the optimal ZF and MMSE performance as they are direct.
Finally, the ZF and MMSE receivers that we are proposing are
closer in complexity to OFDM than the receiver in
[18], which is over an order of magnitude more complex than

TABLE II

Computational Complexity of Different GFDM Receiver Techniques

Technique                     Number of Complex Multiplications

Direct ZF                     $2(MN)^2$
Direct MMSE                   $\mathcal{O}((MN)^3) + 2(MN)^2$
Matched filter + SIC, [18]    $MN(\log_2 MN + \log_2 M + L + I(2\log_2 M + 1))$
Proposed MF/ZF                $\frac{MN}{2}(M + \log_2 N)$
Proposed MMSE                 $\frac{MN}{2}(4M + \log_2 N + 3)$



Fig. 5. Computational complexity comparison of different GFDM receiver
techniques with respect to each other and to that of the OFDM receiver when
N = 1024 and I = 8 for [18].

OFDM. 
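
The orders of magnitude quoted above can be reproduced by evaluating the CM counts directly. The sketch below is illustrative only. The function names are hypothetical and the formulas are the closed forms as read from Table II. The OFDM reference count (M FFTs of size N at $\frac{N}{2}\log_2 N$ CMs each) is an assumption made here. The direct MMSE case is left out because only the order of its cubic term is stated in the text.

```python
from math import log2


def direct_zf(M, N):
    """Direct ZF: two vector-by-matrix multiplications, 2(MN)^2 CMs."""
    return 2 * (M * N) ** 2


def mf_sic(M, N, I=8, L=2):
    """Matched filter with SIC of [18], as read from Table II."""
    return M * N * (log2(M * N) + log2(M) + L + I * (2 * log2(M) + 1))


def proposed_mf_zf(M, N):
    """Proposed MF/ZF receiver: (MN/2)(M + log2 N)."""
    return M * N / 2 * (M + log2(N))


def proposed_mmse(M, N):
    """Proposed MMSE receiver: (MN/2)(4M + log2 N + 3)."""
    return M * N / 2 * (4 * M + log2(N) + 3)


def ofdm(M, N):
    """Assumed OFDM reference: M FFTs of size N, (N/2) log2 N CMs each."""
    return M * N / 2 * log2(N)


M, N = 7, 1024
for name, count in [("direct ZF", direct_zf(M, N)),
                    ("MF + SIC [18]", mf_sic(M, N)),
                    ("proposed MF/ZF", proposed_mf_zf(M, N)),
                    ("proposed MMSE", proposed_mmse(M, N)),
                    ("OFDM (assumed)", ofdm(M, N))]:
    print(f"{name:16s} {count:14.0f} CMs")
```

For N = 1024 and M = 7, the proposed MF/ZF receiver needs 60 928 CMs against 102 760 448 for direct ZF, which is consistent with the three orders of magnitude stated in the text.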

VI. Conclusion 

In this paper, we proposed low complexity transceiver
techniques for GFDM systems. The proposed transceiver
techniques exploit the special structure of the modulation
matrix to reduce the computational cost without incurring
any performance loss. In our proposed transmitter,
block DFT and IDFT matrices were used to make the modulation
matrix sparse and hence reduce the computational
burden. We designed low complexity MF, ZF and MMSE
receivers by block diagonalization of the matrices involved
in demodulation. It was shown that through this block
diagonalization, a substantial amount of complexity reduction
in the matrix inversion and multiplication operations can be
achieved. A unified receiver structure based on MF, ZF and
MMSE criteria was derived. The closed form expressions for
the ZF and MMSE receiver filters were also obtained. We
also analyzed and compared the computational complexities
of our techniques with the existing ones known so far to have
the lowest complexity. We have shown that all the proposed
techniques in this paper involve lower computational cost
than the existing low complexity techniques [18], [20]. For
instance, over an order of magnitude complexity reduction
can be achieved through our ZF receiver compared with the
technique proposed in [18]. Such a substantial reduction in the
amount of computations involved makes our proposed
transceiver structures attractive for hardware implementation
of real-time GFDM systems.

Appendix A
Derivation of $\boldsymbol{\Gamma} = \mathcal{F}_b\mathbf{A}^{\rm H}$

The key idea in the derivation of $\boldsymbol{\Gamma}$ is based on the
fact that the inner product of two complex exponential signals
with different frequencies is zero.

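$$\sum_{\ell=0}^{N-1} e^{j\frac{2\pi}{N}\ell(i-k)} = \begin{cases} N, & (i-k) \bmod N = 0,\\ 0, & \text{otherwise.} \end{cases} \tag{A.1}$$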
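
The orthogonality property can be checked numerically with a few lines. This is a small sketch. The function name is hypothetical and N = 8 is an arbitrary illustrative size.

```python
import cmath


def exponential_inner_product(i, k, N):
    """Sum over l of exp(j*2*pi*l*(i - k)/N): inner product of two N-point exponentials."""
    return sum(cmath.exp(2j * cmath.pi * l * (i - k) / N) for l in range(N))


N = 8
same = exponential_inner_product(3, 3, N)       # equal frequencies
different = exponential_inner_product(3, 5, N)  # different frequencies
print(abs(same), abs(different))
```

The first magnitude equals N and the second vanishes up to rounding error.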

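From the definitions of $\mathcal{F}_b$ and $\mathbf{A}$, $\boldsymbol{\Gamma} = \mathcal{F}_b\mathbf{A}^{\rm H}$ can be
obtained as $\boldsymbol{\Gamma} = [\boldsymbol{\Gamma}_0^{\rm T}, \ldots, \boldsymbol{\Gamma}_{N-1}^{\rm T}]^{\rm T}$, where the $\boldsymbol{\Gamma}_i$'s are M × MN
block matrices that can be written as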

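$$\boldsymbol{\Gamma}_i = \frac{1}{\sqrt{N}}\,\mathbf{G}^{\rm H}\sum_{\ell=0}^{N-1} e^{-j\frac{2\pi}{N}i\ell}\,\mathcal{E}_\ell^{\rm H}. \tag{A.2}$$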

Based on the definition of $\mathcal{E}_i$ and (A.1), we have

$$\sum_{\ell=0}^{N-1} e^{-j\frac{2\pi}{N}i\ell}\,\mathcal{E}_\ell^{\rm H} = N\boldsymbol{\Psi}_\kappa, \tag{A.3}$$

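where $\kappa = (N - i) \bmod N$ and $\boldsymbol{\Psi}_\kappa = {\rm diag}\{[\boldsymbol{\psi}_\kappa^{\rm T}, \ldots, \boldsymbol{\psi}_\kappa^{\rm T}]^{\rm T}\}$
contains M block vectors

$$\boldsymbol{\psi}_\kappa = [0, \ldots, 1, \ldots, 0]^{\rm T} \quad \text{(the 1 is in the $\kappa$th position)}.$$

The $\boldsymbol{\psi}_\kappa$'s are N × 1 vectors and $\boldsymbol{\Psi}_\kappa$ is a diagonal matrix whose
main diagonal elements are made up of M concatenated copies
of the vector $\boldsymbol{\psi}_\kappa$. From (A.3) and (A.2), the $\boldsymbol{\Gamma}_i$'s can be obtained
as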

$$\boldsymbol{\Gamma}_i = \sqrt{N}\,\mathbf{G}^{\rm H}\boldsymbol{\Psi}_\kappa. \tag{A.4}$$

Accordingly, it can be perceived that the block matrices $\boldsymbol{\Gamma}_i$
and hence the matrix $\boldsymbol{\Gamma}$ are sparse. The matrix $\boldsymbol{\Gamma}_i$ has only
$M^2$ non-zero elements, which are located on the circularly
equidistant columns $\kappa, \kappa+N, \ldots, \kappa+(M-1)N$. The elements
of two consecutive non-zero columns of $\boldsymbol{\Gamma}_i$ are circularly
shifted copies of each other. For instance, the second non-zero
column of $\boldsymbol{\Gamma}_i$ is a circularly shifted version of the first
non-zero one by one sample. From (A.4), the first non-zero column
of $\boldsymbol{\Gamma}_i$ can be derived as $\sqrt{N}[g_\kappa, g_{\kappa+(M-1)N}, \ldots, g_{\kappa+N}]^{\rm T}$,
which is the circularly folded version of the $\kappa$th polyphase
component of the prototype filter. One can further deduce that
the matrix $\boldsymbol{\Gamma}$ is a real one consisting of the prototype filter
coefficients.
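
The sparsity pattern described above can be verified numerically on a small example. The sketch below rests on assumptions that are not restated in this appendix: the modulation matrix is taken to be the usual GFDM one, whose column for subcarrier l and symbol m is the prototype filter circularly shifted by mN samples and modulated by $e^{j2\pi ln/N}$, and $\mathcal{F}_b$ is taken to be the normalized N-point block DFT. All function names and the sizes N = 4, M = 3 are hypothetical.

```python
import cmath
import math
import random


def gamma_matrix(g, N, M):
    """Return Gamma = F_b A^H for the assumed GFDM modulation matrix A."""
    MN = M * N
    # Assumed A: column l*M + m is the prototype filter shifted by m*N samples
    # and modulated to subcarrier l.
    A = [[g[(n - m * N) % MN] * cmath.exp(2j * cmath.pi * l * n / N)
          for l in range(N) for m in range(M)]
         for n in range(MN)]
    gamma = [[0j] * MN for _ in range(MN)]
    for i in range(N):              # index of the block Gamma_i
        for m in range(M):          # row inside the block
            for n in range(MN):     # column
                acc = sum(cmath.exp(-2j * cmath.pi * i * l / N)
                          * A[n][l * M + m].conjugate() for l in range(N))
                gamma[i * M + m][n] = acc / math.sqrt(N)
    return gamma


def nonzero_columns(block, tol=1e-9):
    """Indices of the columns of a block that contain a non-negligible entry."""
    return [n for n in range(len(block[0]))
            if any(abs(row[n]) > tol for row in block)]


N, M = 4, 3
random.seed(0)
# Real prototype filter with no zero taps, for illustration only.
g = [random.uniform(0.5, 1.5) for _ in range(M * N)]
gamma = gamma_matrix(g, N, M)

for i in range(N):
    kappa = (N - i) % N
    block = gamma[i * M:(i + 1) * M]
    print(i, nonzero_columns(block), [kappa + p * N for p in range(M)])
```

For every block the two printed lists coincide, the imaginary parts vanish up to rounding, and the first non-zero column equals $\sqrt{N}$ times the folded polyphase component.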

Appendix B

Closed Form Derivation of $\mathcal{D}$

The polyphase components of the prototype filter $\mathbf{g}$ can
be defined as the vectors $\mathbf{g}_0, \mathbf{g}_1, \ldots, \mathbf{g}_{N-1}$, where $\mathbf{g}_i =
[g_i, g_{i+N}, \ldots, g_{i+(M-1)N}]^{\rm T}$. As shown in Appendix A,
$\boldsymbol{\Gamma} = \mathcal{F}_b\mathbf{A}^{\rm H}$ is a sparse matrix with only M non-zero elements
in each column. The elements of $\boldsymbol{\Gamma}$ can be mathematically
represented as


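$$[\boldsymbol{\Gamma}]_{ni} = \begin{cases} \sqrt{N}\,[\tilde{\mathbf{g}}_{n'}]_k, & n = \kappa M, \ldots, (\kappa+1)M-1,\\ 0, & \text{otherwise}, \end{cases} \qquad n' = i \bmod N, \quad k = \left(n + M - \lfloor i/N \rfloor\right) \bmod M, \tag{B.1}$$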


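where $\tilde{\mathbf{g}}_{n'}$ is the circularly folded version of $\mathbf{g}_{n'}$ and $\kappa =
(N - i) \bmod N$. From (B.1), it can be deduced that each
group of M consecutive rows of $\boldsymbol{\Gamma}$, i.e., the $\boldsymbol{\Gamma}_i$'s, whose non-zero
elements are comprised of the elements of the vectors
$\tilde{\mathbf{g}}_{n'}$, is mutually orthogonal to the other ones. This is due to
the fact that the sets of column indices of the $\boldsymbol{\Gamma}_i$'s with non-zero
elements are mutually exclusive with respect to each other.
The block-diagonal matrix $\mathcal{D}$, as derived earlier, can
be calculated as $\mathcal{D} = \mathcal{F}_b\mathbf{A}^{\rm H}\mathbf{A}\mathcal{F}_b^{\rm H}$, which can be rearranged
as $\mathcal{D} = (\mathcal{F}_b\mathbf{A}^{\rm H})(\mathcal{F}_b\mathbf{A}^{\rm H})^{\rm H} = \boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\rm H}$.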

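Due to the orthogonality of the $\boldsymbol{\Gamma}_i$'s with respect to each other,
i.e., $\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}_j^{\rm H} = \mathbf{0}_M$, $i \neq j$, it can be discerned that $\mathcal{D}$
has a block-diagonal structure. Based on equation (B.1),
only equidistant columns of the $\boldsymbol{\Gamma}_i$'s with circular distance of
N are non-zero, and two consecutive non-zero columns
are circularly shifted copies of each other by one sample.
As a case in point, consider $\boldsymbol{\Gamma}_0$ and (B.1). The
elements $[\boldsymbol{\Gamma}_0]_{00} = \sqrt{N}[\tilde{\mathbf{g}}_0]_0$, $[\boldsymbol{\Gamma}_0]_{(M-1)0} = \sqrt{N}[\tilde{\mathbf{g}}_0]_{M-1}$
and $[\boldsymbol{\Gamma}_0]_{0N} = \sqrt{N}[\tilde{\mathbf{g}}_0]_{M-1}$, $[\boldsymbol{\Gamma}_0]_{(M-1)N} = \sqrt{N}[\tilde{\mathbf{g}}_0]_{M-2}$
illustrate that the consecutive non-zero columns of $\boldsymbol{\Gamma}_0$ are
circularly shifted versions of each other. Using (B.1), one can
conclude that the same property holds for the other non-zero
columns of $\boldsymbol{\Gamma}_0$ and all the other $\boldsymbol{\Gamma}_i$'s.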

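The goal here is to derive a closed form for $\mathcal{D}$.

$$\mathcal{D} = \boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\rm H} = \begin{bmatrix} \boldsymbol{\Gamma}_0 \\ \vdots \\ \boldsymbol{\Gamma}_{N-1} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Gamma}_0 \\ \vdots \\ \boldsymbol{\Gamma}_{N-1} \end{bmatrix}^{\rm H} = \begin{bmatrix} \mathcal{D}_0 & & \\ & \ddots & \\ & & \mathcal{D}_{N-1} \end{bmatrix} \tag{B.2}$$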


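$\mathcal{D}$ is an MN × MN matrix comprised of M × M submatrices
which are all zero except the ones located on the main
diagonal, i.e., $\mathcal{D}_i = \boldsymbol{\Gamma}_i\boldsymbol{\Gamma}_i^{\rm H}$. From (B.1), it can be understood
that the first non-zero columns of the matrices $\boldsymbol{\Gamma}_i$ and $\boldsymbol{\Gamma}_i^{\rm H}$ are
equal to $\sqrt{N}\tilde{\mathbf{g}}_\kappa$ and $\sqrt{N}\mathbf{g}_\kappa$, respectively, and the rest of their
non-zero columns are circularly shifted versions of their first
non-zero column. Removing the zero columns of the $\boldsymbol{\Gamma}_i$'s gives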

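$$\mathcal{D}_i = \boldsymbol{\Gamma}_i\boldsymbol{\Gamma}_i^{\rm H} = \bar{\boldsymbol{\Gamma}}_i\bar{\boldsymbol{\Gamma}}_i^{\rm H}, \tag{B.3}$$

where $\bar{\boldsymbol{\Gamma}}_i$ and $\bar{\boldsymbol{\Gamma}}_i^{\rm H}$ are circulant matrices with the first columns
equal to $\sqrt{N}\tilde{\mathbf{g}}_\kappa$ and $\sqrt{N}\mathbf{g}_\kappa$, respectively. Since $\bar{\boldsymbol{\Gamma}}_i$ and $\bar{\boldsymbol{\Gamma}}_i^{\rm H}$
are real and circulant, $\mathcal{D}_i$ is also a real and circulant matrix,
which can be obtained as

$$\mathcal{D}_i = N\,{\rm circ}\{\tilde{\mathbf{g}}_\kappa \circledast \mathbf{g}_\kappa\}. \tag{B.4}$$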
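
The closed form rests on the fact that the product of two circulant matrices is the circulant matrix of the circular convolution of their first columns. A minimal numerical sketch of this step follows. The function names are hypothetical, and the polyphase component is a random real vector chosen only for illustration.

```python
import math
import random


def circulant(first_column):
    """Circulant matrix whose (r, s) entry is first_column[(r - s) mod M]."""
    M = len(first_column)
    return [[first_column[(r - s) % M] for s in range(M)] for r in range(M)]


def circular_convolution(a, b):
    """M-point circular convolution of two equal-length sequences."""
    M = len(a)
    return [sum(a[m] * b[(k - m) % M] for m in range(M)) for k in range(M)]


def circular_fold(v):
    """Circularly folded version: element k becomes v[(-k) mod M]."""
    M = len(v)
    return [v[(-k) % M] for k in range(M)]


def d_block_by_product(g_kappa, N):
    """D_i as the product of the real circulant block and its transpose."""
    M = len(g_kappa)
    C = circulant([math.sqrt(N) * x for x in circular_fold(g_kappa)])
    return [[sum(C[r][t] * C[s][t] for t in range(M)) for s in range(M)]
            for r in range(M)]


def d_block_closed_form(g_kappa, N):
    """D_i as N times the circulant of the folded-with-original convolution."""
    conv = circular_convolution(circular_fold(g_kappa), g_kappa)
    return circulant([N * x for x in conv])


random.seed(1)
N, M = 4, 5
g_kappa = [random.uniform(-1.0, 1.0) for _ in range(M)]
product = d_block_by_product(g_kappa, N)
closed = d_block_closed_form(g_kappa, N)
max_diff = max(abs(product[r][s] - closed[r][s])
               for r in range(M) for s in range(M))
print("largest entry-wise difference:", max_diff)
```

The two constructions agree to within rounding error, so each block can be stored through its first column alone.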